A Language for Nested Data Parallel Design-space Exploration on GPUs

Authors

  • Bo Joel Svensson
  • Mary Sheeran
  • Ryan R. Newton

Abstract

Graphics Processing Units (GPUs) offer potential for very high performance; they are also rapidly evolving. Obsidian is an embedded language (in Haskell) for implementing high-performance kernels to be run on GPUs. We would like to have our cake and eat it too: we want to raise the level of abstraction beyond CUDA code and still give the programmer control over the details relevant to kernel performance. To that end, Obsidian guarantees the elimination of intermediate arrays and predictable space/time costs, while also providing array functions that are polymorphic across different levels of the GPU's hierarchical structure, a limited form of nested data parallelism. We walk through case studies that demonstrate how to use Obsidian for rapid design exploration or auto-tuning, resulting in better performance than hand-tuned kernels in an existing GPU language.
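To make the two abstract claims above more concrete, here is a minimal plain-Haskell sketch, assuming a delayed "pull array" representation (a length plus an index function) and a phantom type parameter that tags the hierarchy level. This is not Obsidian's actual API: the names Pull, Thread, Block, Grid, generate, pmap, zipWithP and force are all hypothetical, and the sketch runs on the CPU with lists rather than emitting GPU code.

```haskell
-- Hypothetical sketch, NOT Obsidian's API: names are illustrative only.

-- Empty types used purely as tags for levels of the GPU hierarchy.
data Thread
data Block
data Grid

-- A delayed ("pull") array: a length and a function from index to element.
-- The parameter 'l' is a phantom type; it carries no runtime data.
data Pull l a = Pull Int (Int -> a)

len :: Pull l a -> Int
len (Pull n _) = n

-- Build a delayed array of length n from an index function.
generate :: Int -> (Int -> a) -> Pull l a
generate = Pull

-- Level-polymorphic map: works unchanged for any level tag 'l'.
-- It only composes index functions, so no intermediate array is built.
pmap :: (a -> b) -> Pull l a -> Pull l b
pmap f (Pull n ix) = Pull n (f . ix)

-- Level-polymorphic zipWith, truncated to the shorter length.
zipWithP :: (a -> b -> c) -> Pull l a -> Pull l b -> Pull l c
zipWithP f (Pull n ix) (Pull m iy) = Pull (min n m) (\i -> f (ix i) (iy i))

-- The only place elements are materialised. In a GPU setting this is
-- where a store to memory would be generated; here it builds a list.
force :: Pull l a -> [a]
force (Pull n ix) = map ix [0 .. n - 1]
```

For example, in GHCi `force (pmap (*2) (pmap (+1) (generate 4 id :: Pull Block Int)))` gives `[2,4,6,8]`, and the two `pmap`s collapse into a single index function before `force` runs. Replacing `Block` with `Grid` in the annotation needs no change to `pmap`, which is the flavour of level polymorphism the abstract describes; the sketch does not model any of the real restrictions that distinguish those levels on a GPU.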

Similar resources

A language for hierarchical data parallel design-space exploration on GPUs

Graphics Processing Units (GPUs) offer potential for very high performance; they are also rapidly evolving. Obsidian is an embedded language (in Haskell) for implementing high performance kernels to be run on GPUs. We would like to have our cake and eat it too; we want to raise the level of abstraction beyond CUDA code and still give the programmer control over the details relevant to kernel pe...

Design Space Exploration for GPU-Based Architecture

Recent advances in Graphics Processing Units (GPUs) provide opportunities to exploit GPUs for non-graphics applications. Scientific computation is inherently parallel, which is a good candidate to utilize the computing power of GPUs. This report investigates QR factorization, which is an important building block of scientific computation. We analyze different mapping methods of QR factorization...

Accelerating high-order WENO schemes using two heterogeneous GPUs

A double-GPU code is developed to accelerate WENO schemes. The test problem is a compressible viscous flow. The convective terms are discretized using third- to ninth-order WENO schemes and the viscous terms are discretized by the standard fourth-order central scheme. The code written in CUDA programming language is developed by modifying a single-GPU code. The OpenMP library is used for parall...

Functional programming for nested data parallelism on GPUs

Recent advances in general purpose GPU computing technology allow new data parallel kernel jobs to be dispatched dynamically during kernel execution. This enables significantly more expressive programming using nested data parallelism (NDP), where the restrictive need for flat data structures and computation has been lifted. Functional programming is fundamentally well suited for expressing dat...

Design Flow for GPU and Multicore Execution of Dynamic Dataflow Programs

Dataflow programming has received increasing attention in the age of multicore and heterogeneous computing. Modular and concurrent dataflow program descriptions enable highly automated approaches for design space exploration, optimization and deployment of applications. A great advance in dataflow programming has been the recent introduction of the RVC-CAL language. Having been standardized by ...

Publication date: 2014